Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
Authors
Abstract
Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this.
Similar resources
The Impact of Outgroup Choice and Missing Data on Major Seed Plant Phylogenetics Using Genome-Wide EST Data
BACKGROUND Genome level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases a large number of candidate genes for inclusion into phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes...
On Calibration and Application of Logit-Based Stochastic Traffic Assignment Models
There is a growing recognition that discrete choice models are capable of providing a more realistic picture of route choice behavior. In particular, influential factors other than travel time that are found to affect the choice of route trigger the application of random utility models in the route choice literature. This paper focuses on path-based, logit-type stochastic route choice models, i...
The phylogenetic position of Myxozoa: exploring conflicting signals in phylogenomic and ribosomal data sets.
Myxozoans are a diverse group of microscopic endoparasites that have been the focus of much controversy regarding their phylogenetic position. Two dramatically different hypotheses have been put forward regarding the placement of Myxozoa within Metazoa. One hypothesis, supported by ribosomal DNA (rDNA) data, places Myxozoa as a sister taxon to Bilateria. The alternative hypothesis, supported by ...
Development of an Improved Fuzzy Approach to Model Potential Sites for Groundwater Artificial Recharge
Delineation of potential sites for groundwater artificial recharge is an important and challenging task. The purpose of this research is to develop a new data-driven fuzzy approach to model potential sites for groundwater artificial recharge. To achieve this end, the efficient criteria of a proper site for groundwater artificial recharge were first recognized and presented as a conceptual model...
A New Model Selection Test with Application to the Censored Data of Carbon Nanotubes Coating
Model selection of nano and micro droplet spreading can be widely used to predict and optimize different coating processes such as ink jet printing, spray painting and plasma spraying. The idea of model selection is to begin with a set of data and rival models and to choose the best one. Making a decision on this set is an important question in statistical inference. Some tests and criteria a...